54 research outputs found

    Saliency Guided End-to-End Learning for Weakly Supervised Object Detection

    Weakly supervised object detection (WSOD), the problem of learning detectors from image-level labels alone, has been attracting more and more interest. However, the problem is quite challenging due to the lack of location supervision. To address this issue, this paper integrates saliency into a deep architecture in which location information is exploited both explicitly and implicitly. Specifically, we select highly confident object proposals under the guidance of class-specific saliency maps. The location, semantic, and saliency information of the selected proposals is then used to explicitly supervise the network by imposing two additional losses. Meanwhile, a saliency prediction sub-network is built into the architecture, and its predictions implicitly guide the localization procedure. The entire network is trained end-to-end. Experiments on PASCAL VOC demonstrate that our approach outperforms all state-of-the-art methods. Comment: Accepted to appear in IJCAI 201
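    To make the explicit supervision concrete, below is a minimal PyTorch sketch of the two auxiliary losses under assumed tensor shapes: proposals are scored by their mean in-box class-specific saliency, the confident ones supervise the classifier as pseudo-positives, and the saliency sub-network output (after a sigmoid) is regressed toward the saliency maps. The function names and the 0.5 threshold are illustrative assumptions, not the authors' code.

    import torch
    import torch.nn.functional as F

    def select_confident_proposals(saliency, boxes, labels, thresh=0.5):
        """Score proposals by mean class-specific saliency inside the box.

        saliency: (C, H, W) class-specific saliency maps in [0, 1]
        boxes:    (N, 4) integer proposal boxes as (x1, y1, x2, y2)
        labels:   (N,) image-level class index assigned to each proposal
        """
        scores = torch.stack([
            saliency[c, y1:y2, x1:x2].mean()
            for (x1, y1, x2, y2), c in zip(boxes.tolist(), labels.tolist())
        ])
        return scores > thresh  # boolean mask of confident proposals

    def auxiliary_losses(cls_logits, pred_saliency, saliency, boxes, labels):
        keep = select_confident_proposals(saliency, boxes, labels)
        # Explicit supervision: treat confident proposals as pseudo-labeled.
        loss_cls = (F.cross_entropy(cls_logits[keep], labels[keep])
                    if keep.any() else cls_logits.sum() * 0.0)
        # Implicit guidance: train the saliency sub-network (sigmoid output)
        # against the class-specific maps.
        loss_sal = F.binary_cross_entropy(pred_saliency, saliency)
        return loss_cls + loss_sal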

    Learning Intra and Inter-Camera Invariance for Isolated Camera Supervised Person Re-identification

    Supervised person re-identification assumes that a person has images captured under multiple cameras. However, when cameras are placed far apart, a person rarely appears in more than one camera. This paper thus studies person re-ID under the isolated camera supervised (ISCS) setting. Instead of trying to generate fake cross-camera features like previous methods, we explore a novel perspective by making efficient use of the variation in the training data. Under the ISCS setting, a person has only a limited number of images from a single camera, so camera bias becomes a critical issue confounding ID discrimination: cross-camera images are prone to being recognized as different IDs simply by camera style. To eliminate the confounding effect of camera bias, we propose to learn both intra- and inter-camera invariance under a unified framework. First, we construct style-consistent environments via clustering and perform prototypical contrastive learning within each environment. Meanwhile, strongly augmented images are contrasted with the original prototypes to enforce intra-camera augmentation invariance. For inter-camera invariance, we further design a much improved variant of the multi-camera negative loss that optimizes the distance of multi-level negatives. The resulting model learns to be invariant to both subtle and severe style variation within and across cameras. We conduct extensive experiments on multiple benchmarks and validate the effectiveness and superiority of the proposed method. Code will be available at https://github.com/Terminator8758/IICI. Comment: ACM MultiMedia 202
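    As a concrete illustration, the following is a minimal PyTorch sketch of the intra-camera part: prototypical contrastive learning inside one style-consistent environment, with strongly augmented views contrasted against the original prototypes. Tensor shapes, the temperature of 0.05, and the function names are assumptions; the inter-camera multi-level negative loss is omitted.

    import torch.nn.functional as F

    def prototypical_loss(feats, protos, targets, tau=0.05):
        """feats: (B, D) L2-normalized features from one style-consistent
        environment; protos: (P, D) L2-normalized cluster prototypes;
        targets: (B,) index of each feature's own prototype."""
        logits = feats @ protos.t() / tau  # temperature-scaled cosine similarity
        return F.cross_entropy(logits, targets)

    def intra_camera_loss(f_orig, f_strong, protos, targets, tau=0.05):
        # Contrast both the original and the strongly augmented view against
        # the same original prototypes, enforcing augmentation invariance.
        return (prototypical_loss(f_orig, protos, targets, tau)
                + prototypical_loss(f_strong, protos, targets, tau))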

    Prototypical Contrastive Learning-based CLIP Fine-tuning for Object Re-identification

    This work aims to adapt large-scale pre-trained vision-language models, such as contrastive language-image pretraining (CLIP), to enhance the performance of object re-identification (Re-ID) across various supervision settings. Although prompt learning has enabled a recent work named CLIP-ReID to achieve promising performance, the underlying mechanisms and the necessity of prompt learning remain unclear due to the absence of semantic labels in Re-ID tasks. In this work, we first analyze the role of prompt learning in CLIP-ReID and identify its limitations. Based on our investigations, we propose a simple yet effective approach to adapt CLIP for supervised object Re-ID. Our approach directly fine-tunes the image encoder of CLIP using a prototypical contrastive learning (PCL) loss, eliminating the need for prompt learning. Experimental results on both person and vehicle Re-ID datasets demonstrate the competitiveness of our method compared to CLIP-ReID. Furthermore, we extend our PCL-based CLIP fine-tuning approach to unsupervised scenarios, where we achieve state-of-the-art performance.
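    For illustration, here is a minimal sketch of PCL-based fine-tuning of a CLIP image encoder, assuming L2-normalized features and identity-labeled batches; `build_prototypes`, the temperature, and the encoder interface are assumptions rather than the paper's implementation.

    import torch.nn.functional as F

    def build_prototypes(feats, labels, num_ids):
        """Sum L2-normalized features per identity, then re-normalize
        (same direction as the per-ID mean)."""
        protos = feats.new_zeros(num_ids, feats.size(1))
        protos.index_add_(0, labels, feats)
        return F.normalize(protos, dim=1)

    def pcl_loss(image_encoder, images, labels, protos, tau=0.05):
        # Fine-tune only the image encoder: classify each image against the
        # identity prototypes instead of learning text prompts.
        feats = F.normalize(image_encoder(images), dim=1)
        logits = feats @ protos.t() / tau
        return F.cross_entropy(logits, labels)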

    Transformer Based Multi-Grained Features for Unsupervised Person Re-Identification

    Multi-grained features extracted from convolutional neural networks (CNNs) have demonstrated strong discrimination ability in supervised person re-identification (Re-ID) tasks. Inspired by them, this work investigates how to extract multi-grained features from a pure transformer network to address the unsupervised Re-ID problem, which is label-free but much more challenging. To this end, we build a dual-branch network architecture based upon a modified Vision Transformer (ViT). The local tokens output by each branch are reshaped and then uniformly partitioned into multiple stripes to generate part-level features, while the global tokens of the two branches are averaged to produce a global feature. Further, based upon offline-online associated camera-aware proxies (O2CAP), a top-performing unsupervised Re-ID method, we define offline and online contrastive learning losses with respect to both the global and part-level features to conduct unsupervised learning. Extensive experiments on three person Re-ID datasets show that the proposed method outperforms state-of-the-art unsupervised methods by a considerable margin, greatly narrowing the gap to supervised counterparts. Code will be available soon at https://github.com/RikoLi/WACV23-workshop-TMGF. Comment: Accepted by WACVW 2023, 3rd Workshop on Real-World Surveillance: Applications and Challenge
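    The token handling can be sketched as follows in PyTorch, assuming a ViT whose local tokens form an (H, W) patch grid; the reshaping, uniform row-wise striping, and branch averaging mirror the description above, while the names and the stripe count of 4 are illustrative.

    import torch

    def multi_grained_features(cls_a, cls_b, local_tokens, grid_hw, num_stripes=4):
        """cls_a, cls_b:  (B, D) global tokens of the two branches
        local_tokens: (B, H*W, D) local tokens of one branch
        grid_hw:      (H, W) patch grid of the ViT"""
        B, _, D = local_tokens.shape
        H, W = grid_hw
        grid = local_tokens.reshape(B, H, W, D)
        # Uniformly partition the rows into stripes; average-pool each stripe
        # over its rows and columns to get one part-level feature per stripe.
        parts = [s.mean(dim=(1, 2)) for s in grid.chunk(num_stripes, dim=1)]
        global_feat = (cls_a + cls_b) / 2  # average the two branches' globals
        return global_feat, torch.stack(parts, dim=1)  # (B, D) and (B, S, D)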

    Promoting Public Participation in Post-Disaster Construction through Wechat Platform

    Purpose – How could memory, heritage, and post-disaster construction be integrated in practice? The purpose of this paper is to introduce our approach to public participation in reconstruction planning after a raging fire destroyed part of the historic town of Shangri-La, China. Approach – We developed two kinds of crowdsourcing platforms to collect and present memories of the vanished streets, which were destroyed completely by the fire. One is on the WeChat platform: through secondary development on WeChat, we built a public service account that allows users to upload photos, hand-painted pictures, and text, all of which are saved automatically in our database. The other platform is on the web: the website is designed for users to upload photos based on the location where they were taken. All the images collected from the two platforms can be openly accessed and viewed with location information, which had been sorted out by volunteers. The WeChat platform is also used to communicate and to provide education and information about the historic town, promoting awareness of its heritage value. Users can send text to the public account without privacy risk. Findings – Spread with help from a local non-governmental organization, the invitations of the WeChat public service account received a remarkable amount of attention, which, according to automatic web statistics, reached up to 40,000. About 150 people followed the WeChat public account. In total we received nearly 1,000 photos and hand-painted pictures. About half of our users are from the Shangri-La local community; their uploaded files include historical photos of the community, providing us with a local perspective grounded in long-term concern. The other half are travellers from all over the world, mostly from China but also from Europe; their photos and paintings also contribute to the memory construction. Implications – The widespread use of smart mobile devices can make individuals more active participants in public affairs, provided the supporting infrastructure is carefully designed. In this way, new technologies may contribute to a people-centred principle in our conservation and design process. Value – Our approach applies Volunteered Geographic Information (VGI) (Goodchild, 2007) to collecting memory fragments for post-disaster construction. Through the convenience of uploading photos and texts from mobile devices, we successfully involved local people and travellers in participation. The case might bring insight into the field of public participation practice.
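    As a rough sketch of the web-side collection pipeline described above, the following Flask endpoint accepts a photo upload with location metadata; the framework choice, field names, and storage layout are all assumptions, since the paper does not detail the implementation.

    import os
    from flask import Flask, request, jsonify
    from werkzeug.utils import secure_filename

    app = Flask(__name__)
    UPLOAD_DIR = "uploads"
    os.makedirs(UPLOAD_DIR, exist_ok=True)

    @app.route("/upload", methods=["POST"])
    def upload():
        photo = request.files["photo"]
        lat = request.form.get("lat")        # where the photo was taken
        lng = request.form.get("lng")
        note = request.form.get("note", "")  # optional memory fragment
        path = os.path.join(UPLOAD_DIR, secure_filename(photo.filename))
        photo.save(path)
        # A real deployment would store the record in a database so that
        # volunteers can sort submissions by location.
        return jsonify({"path": path, "lat": lat, "lng": lng, "note": note})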

    Camera-aware Proxies for Unsupervised Person Re-Identification

    This paper tackles the purely unsupervised person re-identification (Re-ID) problem, which requires no annotations. Some previous methods adopt clustering techniques to generate pseudo labels and use the produced labels to train Re-ID models progressively. These methods are relatively simple but effective. However, most clustering-based methods take each cluster as a pseudo identity class, neglecting the large intra-ID variance caused mainly by the change of camera views. To address this issue, we propose to split each single cluster into multiple proxies, with each proxy representing the instances coming from the same camera. These camera-aware proxies enable us to deal with large intra-ID variance and generate more reliable pseudo labels for learning. Based on the camera-aware proxies, we design both intra- and inter-camera contrastive learning components for our Re-ID model to effectively learn the ID discrimination ability within and across cameras. Meanwhile, a proxy-balanced sampling strategy is also designed to further facilitate learning. Extensive experiments on three large-scale Re-ID datasets show that our proposed approach outperforms most unsupervised methods by a significant margin. Especially, on the challenging MSMT17 dataset, we gain 14.3% Rank-1 and 10.2% mAP improvements compared to the second place. Code is available at: https://github.com/Terminator8758/CAP-master. Comment: Accepted to AAAI 2021. Code is available at: https://github.com/Terminator8758/CAP-maste
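    To illustrate the core idea, here is a minimal PyTorch sketch that splits each cluster into per-camera proxies; the data layout and function name are assumptions, and the real implementation is in the linked repository.

    from collections import defaultdict
    import torch
    import torch.nn.functional as F

    def camera_aware_proxies(feats, cluster_ids, cam_ids):
        """feats: (N, D) L2-normalized features; cluster_ids, cam_ids: (N,).
        Returns each proxy's (cluster, camera) key and its centroid."""
        groups = defaultdict(list)
        for i, (cl, cam) in enumerate(zip(cluster_ids.tolist(), cam_ids.tolist())):
            if cl >= 0:  # skip un-clustered outliers, conventionally labeled -1
                groups[(cl, cam)].append(i)
        keys, protos = [], []
        for key, idx in groups.items():
            keys.append(key)
            protos.append(feats[idx].mean(dim=0))
        return keys, F.normalize(torch.stack(protos), dim=1)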